{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 北京市空气PM2.5值预测\n",
    "\n",
    "## 一、分析目标\n",
    "**通过北京市历史24小时天气数据，预测之后12小时天气数据中PM2.5的值**\n",
    "\n",
    "## 二、项目背景\n",
    "\n",
    "**在北京，冬天最令人头疼的就是雾霾问题，每当雾霾天气来临，那种灰蒙蒙的空气和将口鼻掩盖在厚厚的口罩下呼吸困难的感觉，让人情绪低落。而雾霾的罪魁祸首就是PM2.5。**\n",
    "\n",
    "**本次分析主要是想要使用线性回归模型对PM2.5值进行预测**\n",
    "\n",
    "**昨天已经用（北京）历史24小时即08/07日22时-08/0821时共24小时数据对LinearRegression模型进行了训练，今天先来采集新生成的数据做测试数据集，由于已经过去了12个小时，所以共有12组数据可用**\n",
    "\n",
    "## 三 数据来源\n",
    "\n",
    "**本次分析数据来自心知天气网，该网站可以通过Restful风格URL直接获取Json格式气象和大气数据，获取方式较简单。**\n",
    "\n",
    "## 四 数据分析及预测\n",
    "\n",
    "### 1. 数据规整"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 61,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "from io import StringIO\n",
    "from urllib import request\n",
    "import json\n",
    "from dateutil.parser import parse"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**从网络获取数据并进行整理**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 62,
   "metadata": {},
   "outputs": [],
   "source": [
    "url_beijing_w = 'https://api.seniverse.com/v3/weather/hourly_history.json?key=Sz6GmmiQ6SAjYTKbc&location=beijing&language=zh-Hans&unit=c'\n",
    "url_beijing_p = 'https://api.seniverse.com/v3/air/hourly_history.json?key=Sz6GmmiQ6SAjYTKbc&location=beijing&language=zh-Hans&scope=city'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 63,
   "metadata": {},
   "outputs": [],
   "source": [
    "s_p = request.urlopen(url_beijing_p).read().decode('utf8')\n",
    "s_w = request.urlopen(url_beijing_w).read().decode('utf8')\n",
    "data_dict_p = json.loads(s_p)\n",
    "data_dict_w = json.loads(s_w)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 64,
   "metadata": {},
   "outputs": [],
   "source": [
    "def gen_table_p(list2):\n",
    "    data_dict = {}\n",
    "    for i, value in enumerate(list2):\n",
    "        data_dict[i] = value['city']\n",
    "    return data_dict"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 65,
   "metadata": {},
   "outputs": [],
   "source": [
    "def gen_table_w(list2):\n",
    "    data_dict = {}\n",
    "    for i, value in enumerate(list2):\n",
    "        data_dict[i] = value\n",
    "    return data_dict"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**将气象和大气污染物数据转换成DataFrame表格**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 66,
   "metadata": {},
   "outputs": [],
   "source": [
    "data_list_p = data_dict_p['results'][0]['hourly_history']\n",
    "data_list_p = gen_table(data_list_p)\n",
    "data_p = pd.DataFrame(data_list_p)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 67,
   "metadata": {},
   "outputs": [],
   "source": [
    "data_list_w = data_dict_w['results'][0]['hourly_history']\n",
    "data_list_w = gen_table_w(data_list_w)\n",
    "data_w = pd.DataFrame(data_list_w)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 68,
   "metadata": {},
   "outputs": [],
   "source": [
    "data_p = data_p.T\n",
    "data_w = data_w.T"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**调整时间格式，删除不要的特征变量**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 69,
   "metadata": {},
   "outputs": [],
   "source": [
    "def adjust_time(data):\n",
    "    time = data['last_update'].astype(str)\n",
    "    time = time.str[:19]\n",
    "    time = time.str.replace('T', ' ')\n",
    "    time = time.map(lambda x : parse(x))\n",
    "    time = time.dt.strftime('%H-%m/%d')\n",
    "    data['last_update'] = time\n",
    "    return data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 70,
   "metadata": {},
   "outputs": [],
   "source": [
    "data_p = adjust_time(data_p)\n",
    "data_w = adjust_time(data_w)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 71,
   "metadata": {},
   "outputs": [],
   "source": [
    "data = pd.merge(data_p, data_w, on = 'last_update')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 72,
   "metadata": {},
   "outputs": [],
   "source": [
    "data_test = data[:12]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 74,
   "metadata": {},
   "outputs": [],
   "source": [
    "data_test = data_test.drop(['dew_point', 'wind_direction', 'wind_direction_degree', 'text', 'code', 'wind_scale','last_update', 'quality'], axis = 1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 75,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>aqi</th>\n",
       "      <th>co</th>\n",
       "      <th>no2</th>\n",
       "      <th>o3</th>\n",
       "      <th>pm10</th>\n",
       "      <th>pm25</th>\n",
       "      <th>so2</th>\n",
       "      <th>clouds</th>\n",
       "      <th>feels_like</th>\n",
       "      <th>humidity</th>\n",
       "      <th>pressure</th>\n",
       "      <th>temperature</th>\n",
       "      <th>visibility</th>\n",
       "      <th>wind_speed</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>78</td>\n",
       "      <td>0.817</td>\n",
       "      <td>29</td>\n",
       "      <td>61</td>\n",
       "      <td>61</td>\n",
       "      <td>57</td>\n",
       "      <td>1</td>\n",
       "      <td>50</td>\n",
       "      <td>28</td>\n",
       "      <td>77</td>\n",
       "      <td>1001</td>\n",
       "      <td>28</td>\n",
       "      <td>3.9</td>\n",
       "      <td>11.16</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>80</td>\n",
       "      <td>0.850</td>\n",
       "      <td>29</td>\n",
       "      <td>49</td>\n",
       "      <td>59</td>\n",
       "      <td>59</td>\n",
       "      <td>1</td>\n",
       "      <td>50</td>\n",
       "      <td>27</td>\n",
       "      <td>79</td>\n",
       "      <td>1001</td>\n",
       "      <td>27</td>\n",
       "      <td>3.7</td>\n",
       "      <td>8.64</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>83</td>\n",
       "      <td>0.842</td>\n",
       "      <td>30</td>\n",
       "      <td>39</td>\n",
       "      <td>62</td>\n",
       "      <td>61</td>\n",
       "      <td>2</td>\n",
       "      <td>50</td>\n",
       "      <td>26</td>\n",
       "      <td>83</td>\n",
       "      <td>1001</td>\n",
       "      <td>27</td>\n",
       "      <td>3.1</td>\n",
       "      <td>8.64</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>84</td>\n",
       "      <td>0.817</td>\n",
       "      <td>28</td>\n",
       "      <td>44</td>\n",
       "      <td>61</td>\n",
       "      <td>62</td>\n",
       "      <td>2</td>\n",
       "      <td>50</td>\n",
       "      <td>25</td>\n",
       "      <td>85</td>\n",
       "      <td>1001</td>\n",
       "      <td>26</td>\n",
       "      <td>3.1</td>\n",
       "      <td>6.12</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>80</td>\n",
       "      <td>0.792</td>\n",
       "      <td>29</td>\n",
       "      <td>49</td>\n",
       "      <td>62</td>\n",
       "      <td>59</td>\n",
       "      <td>2</td>\n",
       "      <td>50</td>\n",
       "      <td>25</td>\n",
       "      <td>86</td>\n",
       "      <td>1001</td>\n",
       "      <td>25</td>\n",
       "      <td>3.1</td>\n",
       "      <td>9.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>80</td>\n",
       "      <td>0.775</td>\n",
       "      <td>28</td>\n",
       "      <td>54</td>\n",
       "      <td>65</td>\n",
       "      <td>59</td>\n",
       "      <td>2</td>\n",
       "      <td>50</td>\n",
       "      <td>26</td>\n",
       "      <td>82</td>\n",
       "      <td>1001</td>\n",
       "      <td>26</td>\n",
       "      <td>3.5</td>\n",
       "      <td>9.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>75</td>\n",
       "      <td>0.767</td>\n",
       "      <td>27</td>\n",
       "      <td>61</td>\n",
       "      <td>64</td>\n",
       "      <td>55</td>\n",
       "      <td>2</td>\n",
       "      <td>50</td>\n",
       "      <td>26</td>\n",
       "      <td>79</td>\n",
       "      <td>1001</td>\n",
       "      <td>27</td>\n",
       "      <td>4.6</td>\n",
       "      <td>9.72</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>69</td>\n",
       "      <td>0.808</td>\n",
       "      <td>25</td>\n",
       "      <td>73</td>\n",
       "      <td>65</td>\n",
       "      <td>50</td>\n",
       "      <td>2</td>\n",
       "      <td>50</td>\n",
       "      <td>27</td>\n",
       "      <td>79</td>\n",
       "      <td>1002</td>\n",
       "      <td>27</td>\n",
       "      <td>4.3</td>\n",
       "      <td>9.36</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>67</td>\n",
       "      <td>0.817</td>\n",
       "      <td>25</td>\n",
       "      <td>81</td>\n",
       "      <td>69</td>\n",
       "      <td>48</td>\n",
       "      <td>2</td>\n",
       "      <td>50</td>\n",
       "      <td>27</td>\n",
       "      <td>79</td>\n",
       "      <td>1002</td>\n",
       "      <td>27</td>\n",
       "      <td>4.1</td>\n",
       "      <td>9.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>62</td>\n",
       "      <td>0.767</td>\n",
       "      <td>26</td>\n",
       "      <td>88</td>\n",
       "      <td>67</td>\n",
       "      <td>44</td>\n",
       "      <td>2</td>\n",
       "      <td>50</td>\n",
       "      <td>27</td>\n",
       "      <td>79</td>\n",
       "      <td>1002</td>\n",
       "      <td>27</td>\n",
       "      <td>5.1</td>\n",
       "      <td>7.56</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>59</td>\n",
       "      <td>0.758</td>\n",
       "      <td>25</td>\n",
       "      <td>103</td>\n",
       "      <td>60</td>\n",
       "      <td>42</td>\n",
       "      <td>2</td>\n",
       "      <td>90</td>\n",
       "      <td>28</td>\n",
       "      <td>78</td>\n",
       "      <td>1002</td>\n",
       "      <td>28</td>\n",
       "      <td>6.2</td>\n",
       "      <td>6.48</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>56</td>\n",
       "      <td>0.733</td>\n",
       "      <td>24</td>\n",
       "      <td>114</td>\n",
       "      <td>61</td>\n",
       "      <td>39</td>\n",
       "      <td>2</td>\n",
       "      <td>90</td>\n",
       "      <td>28</td>\n",
       "      <td>74</td>\n",
       "      <td>1002</td>\n",
       "      <td>28</td>\n",
       "      <td>8.3</td>\n",
       "      <td>10.08</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   aqi     co no2   o3 pm10 pm25 so2 clouds feels_like humidity pressure  \\\n",
       "0   78  0.817  29   61   61   57   1     50         28       77     1001   \n",
       "1   80  0.850  29   49   59   59   1     50         27       79     1001   \n",
       "2   83  0.842  30   39   62   61   2     50         26       83     1001   \n",
       "3   84  0.817  28   44   61   62   2     50         25       85     1001   \n",
       "4   80  0.792  29   49   62   59   2     50         25       86     1001   \n",
       "5   80  0.775  28   54   65   59   2     50         26       82     1001   \n",
       "6   75  0.767  27   61   64   55   2     50         26       79     1001   \n",
       "7   69  0.808  25   73   65   50   2     50         27       79     1002   \n",
       "8   67  0.817  25   81   69   48   2     50         27       79     1002   \n",
       "9   62  0.767  26   88   67   44   2     50         27       79     1002   \n",
       "10  59  0.758  25  103   60   42   2     90         28       78     1002   \n",
       "11  56  0.733  24  114   61   39   2     90         28       74     1002   \n",
       "\n",
       "   temperature visibility wind_speed  \n",
       "0           28        3.9      11.16  \n",
       "1           27        3.7       8.64  \n",
       "2           27        3.1       8.64  \n",
       "3           26        3.1       6.12  \n",
       "4           25        3.1        9.0  \n",
       "5           26        3.5        9.0  \n",
       "6           27        4.6       9.72  \n",
       "7           27        4.3       9.36  \n",
       "8           27        4.1        9.0  \n",
       "9           27        5.1       7.56  \n",
       "10          28        6.2       6.48  \n",
       "11          28        8.3      10.08  "
      ]
     },
     "execution_count": 75,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data_test"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2.生成训练数据集合测试数据集\n",
    "**数据备份，生成训练数据集和测试数据集**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 76,
   "metadata": {},
   "outputs": [],
   "source": [
    "data_test.to_excel('D:/python/practise/sample/weather/data_test(22-09).xlsx')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 85,
   "metadata": {},
   "outputs": [],
   "source": [
    "data_train = pd.read_excel('D:/python/practise/sample/weather/data_all(22-21).xlsx')\n",
    "data_test = pd.read_excel('D:/python/practise/sample/weather/data_test(22-09).xlsx')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 87,
   "metadata": {},
   "outputs": [],
   "source": [
    "data_train = data_train.drop(['Unnamed: 0', 'last_update', 'station'], axis = 1)\n",
    "data_test = data_test.drop('Unnamed: 0', axis = 1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 88,
   "metadata": {},
   "outputs": [],
   "source": [
    "y_train = data_train['pm25'].values\n",
    "x_train = data_train.drop('pm25', axis = 1).values"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**把PM2.5提取出来做真实值**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 89,
   "metadata": {},
   "outputs": [],
   "source": [
    "y_true = data_test['pm25'].values\n",
    "x_test = data_test.drop('pm25', axis = 1).values"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**先用普通线性回归模型预测**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 90,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.linear_model import LinearRegression"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 91,
   "metadata": {},
   "outputs": [],
   "source": [
    "model = LinearRegression()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 92,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None,\n",
       "         normalize=False)"
      ]
     },
     "execution_count": 92,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "model.fit(x_train, y_train)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 93,
   "metadata": {},
   "outputs": [],
   "source": [
    "y_esti = model.predict(x_test)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**这是预测结果值**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 94,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([42.58293092, 44.55257971, 43.7785975 , 42.5218313 , 39.48083543,\n",
       "       41.79761404, 39.50294654, 41.746076  , 42.1306705 , 39.09266014,\n",
       "       34.90946585, 32.29064508])"
      ]
     },
     "execution_count": 94,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "y_esti"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**这是真实值**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 95,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([57, 59, 61, 62, 59, 59, 55, 50, 48, 44, 42, 39], dtype=int64)"
      ]
     },
     "execution_count": 95,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "y_true"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**用残差平方和查看预测效果，公式如下：**$$score = \\frac{1}{n}\\sum_1^n(y_i-y^*)^2$$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 97,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "185.9652934007928"
      ]
     },
     "execution_count": 97,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "((y_esti - y_true)**2).mean()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**再来试试岭回归模型**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 121,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.linear_model import Ridge\n",
    "from sklearn.linear_model import RidgeCV"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 116,
   "metadata": {},
   "outputs": [],
   "source": [
    "reg = Ridge(alpha = 0.01)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 117,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Ridge(alpha=0.01, copy_X=True, fit_intercept=True, max_iter=None,\n",
       "   normalize=False, random_state=None, solver='auto', tol=0.001)"
      ]
     },
     "execution_count": 117,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "reg.fit(x_train, y_train)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 118,
   "metadata": {},
   "outputs": [],
   "source": [
    "y_esti_reg = reg.predict(x_test)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 119,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "186.24165721215715"
      ]
     },
     "execution_count": 119,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "((y_esti_reg - y_true)**2).mean()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 122,
   "metadata": {},
   "outputs": [],
   "source": [
    "reg_cv = RidgeCV(alphas=[0.1, 1.0, 10.0])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 123,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "RidgeCV(alphas=array([ 0.1,  1. , 10. ]), cv=None, fit_intercept=True,\n",
       "    gcv_mode=None, normalize=False, scoring=None, store_cv_values=False)"
      ]
     },
     "execution_count": 123,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "reg_cv.fit(x_train, y_train)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 124,
   "metadata": {},
   "outputs": [],
   "source": [
    "y_esti_regcv = reg_cv.predict(x_test)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 125,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "188.67034918027025"
      ]
     },
     "execution_count": 125,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "((y_esti_regcv - y_true)**2).mean()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**再用Lasso回归试试**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 126,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.linear_model import Lasso"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 139,
   "metadata": {},
   "outputs": [],
   "source": [
    "lasso = Lasso(alpha = 0.01)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 140,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Lasso(alpha=0.01, copy_X=True, fit_intercept=True, max_iter=1000,\n",
       "   normalize=False, positive=False, precompute=False, random_state=None,\n",
       "   selection='cyclic', tol=0.0001, warm_start=False)"
      ]
     },
     "execution_count": 140,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "lasso.fit(x_train, y_train)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 141,
   "metadata": {},
   "outputs": [],
   "source": [
    "y_esti_lasso = lasso.predict(x_test)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 142,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "186.51258366387617"
      ]
     },
     "execution_count": 142,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "((y_esti_lasso - y_true)**2).mean()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 五、结论\n",
    "**预测值有一定偏差，但基本反映了变化趋势。进过几个模型的筛选，最终还是普通线性回归模型效果稍好。**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
